SUMMARY


This final report describes research on stochastic and adaptive systems by
faculty and students of the Decision and Control Sciences Group of the M.I.T.
Laboratory for Information and Decision Systems (formerly Electronic Systems
Laboratory), with support provided by the United States Air Force Office of
Scientific Research under Grant AFOSR 77-3281B. The Grant Monitor was
Charles L. Nefzger, Major, USAF. The time period covered by this report is
February 1, 1979 to January 31, 1980.

Substantial progress is reported in the areas of nonlinear filtering,
stochastic control, adaptive control and stochastic adaptive control.




1. Introduction

The research we have conducted over the past several years, and in particular
during the period February 1, 1979 to January 31, 1980, has been concerned with
fundamental aspects of controlling linear and non-linear stochastic systems in
the presence of measurement and parameter uncertainties. When the uncertainties
reside only in the state description of the physical system and in the
measurements, we refer to the control problem as a stochastic control problem.
If in addition there are parameter uncertainties, the problem is referred to as
an adaptive control problem. This is because, in addition to state estimation,
some form of parameter identification scheme will be needed, and almost always
the control and estimation-identification functions will interact in a
non-trivial way. The simplest such example is the control of a plant described
by an integrator with an unknown gain, in the presence of additive white
Gaussian noise in the state and measurement processes, with the performance of
the controller judged by a quadratic performance criterion.

A subproblem of the stochastic and adaptive control problem is the state
estimation problem. Suppose for a moment that there are no parameter
uncertainties in the dynamical description of the state of the system. Then all
the probabilistic information that one can extract about the "state" of the
system on the basis of noisy measurements of the state is contained in the
conditional probability density of the state given the observations. Indeed,
this is the probabilistic state of the joint physical-measurement system. The
recursive computation of the probabilistic state is the state estimation
problem. If this could be done, then one could look for the best controller as
a function of this probabilistic state, best being judged in terms of a
suitable performance criterion.

During  the  current  grant  period  we  have  made  important  progress  on  many  aspects 
of  the  overall  problem  we  have  discussed  above. 
This report is divided into four main sections:

(i) Linear and Non-linear State Estimation

(ii) Stochastic Control

(iii) Adaptive Control

(iv) Stochastic Adaptive Control,

corresponding to the subdivision of the control of uncertain systems we have
made above.

This work was carried out under the joint direction of Professors M. Athans
and S.K. Mitter. They were assisted by Dr. L. Valavani, Professor John Baras
(visiting from the University of Maryland and partially supported by the
grant), Mr. D. Ocone (research assistant), Mr. T. Pappas (research assistant)
and Mr. L. Vallot (research assistant).
2. Linear and Non-linear State Estimation

A general non-linear state estimation problem can be described as follows:

Let the state of the stochastic system have a dynamical description

(1)  $x_t = x_0 + \int_0^t f(x_s)\,ds + \int_0^t g(x_s)\,dw_s$ ,

where $f$ and $g$ are sufficiently smooth functions and $dw_s$ is white
Gaussian noise, and let the measurement equation be

(2)  $y_t = \int_0^t h(x_s)\,ds + n_t$ ,

where $n$ is the integral of white Gaussian noise and $h$ is also a smooth
function. Both the processes $x$ and $y$ can be vector processes, in which case
$f$ is a vector-valued function and $g$ is matrix-valued.
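As an illustration only, the model (1)-(2) can be simulated for scalar $x$ and $y$ with an Euler-Maruyama discretization. The sketch below is not taken from the work reported here: the function name `simulate`, the step count, the unit noise intensities and the cubic-sensor example are all assumptions chosen for concreteness.

```python
import math
import random


def simulate(f, g, h, x0, t_end=1.0, n_steps=1000, seed=0):
    """Euler-Maruyama discretization of (1) and (2) for scalar x and y.

    Assumes unit-intensity, mutually independent noises w and n.
    Returns the sampled state path xs and observation path ys.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    sqrt_dt = math.sqrt(dt)
    x, y = x0, 0.0
    xs, ys = [x], [y]
    for _ in range(n_steps):
        dw = sqrt_dt * rng.gauss(0.0, 1.0)  # increment of the state noise w
        dn = sqrt_dt * rng.gauss(0.0, 1.0)  # increment of the measurement noise n
        y += h(x) * dt + dn                 # (2): dy = h(x) dt + dn
        x += f(x) * dt + g(x) * dw          # (1): dx = f(x) dt + g(x) dw
        xs.append(x)
        ys.append(y)
    return xs, ys


# Hypothetical example: stable linear drift observed through a cubic sensor.
xs, ys = simulate(f=lambda x: -x, g=lambda x: 1.0, h=lambda x: x ** 3, x0=0.5)
```

The observation increment is formed from the state before it is advanced, which matches the Ito interpretation of (2).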

The problem of non-linear state estimation is to compute recursively (in real
time) the conditional density $p(x_t \mid y_s,\ 0 \le s \le t)$. The most
celebrated special case of this problem is when the functions $f$, $g$, $h$ are
linear in $x$; this problem was completely solved by Kalman. In this case the
conditional density is Gaussian and hence can be parametrized by its mean and
covariance, and differential equations for the evolution of the mean and
covariance in time can be obtained. Furthermore, under appropriate hypotheses
of controllability and observability, the resulting state estimator can be
shown to be asymptotically stable.
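For reference, the mean and covariance equations take the following standard (Kalman-Bucy) form under assumptions that are ours rather than the report's: $f(x) = Ax$, $h(x) = Hx$, a constant noise gain $g = G$, a Gaussian initial state, and independent unit-intensity noises in (1) and (2). The symbols $A$, $G$, $H$, $\hat{x}_t$ and $P_t$ are introduced here only for this illustration.

```latex
% Conditional mean \hat{x}_t and conditional covariance P_t of x_t given
% {y_s, 0 <= s <= t}, assuming f(x) = A x, g = G (constant), h(x) = H x.
\begin{align}
  d\hat{x}_t &= A\,\hat{x}_t\,dt + P_t H^{\top}\bigl(dy_t - H\,\hat{x}_t\,dt\bigr), \\
  \dot{P}_t  &= A P_t + P_t A^{\top} + G G^{\top} - P_t H^{\top} H P_t .
\end{align}
```

The covariance equation is a deterministic Riccati equation that does not depend on the observations, which is what makes the linear case finite-dimensional.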

An approach towards obtaining the solution to this problem is the "innovations
approach", first proposed by Bode and Shannon in the early fifties and later
developed by Kailath and his students. In this approach, one forms the
so-called innovations process

(3)  $\nu_t = y_t - \int_0^t \widehat{h(x_s)}\,ds$ ,

where the hat denotes conditional expectation. This process can be shown to be
the integral of white noise, and the estimator can then be computed based on
the innovations process. A conjecture due to Kailath and Frost concerning the
innovations process, open since 1967, has recently been settled by Professor
Mitter in collaboration with Dr. D. Allinger of the Mathematics Department at
M.I.T. [1]. The conjecture was that the innovations process contains the same
information as the observations process or, in more technical terms, that the
σ-field generated by the observations equals the σ-field generated by the
innovations.
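The whiteness of the innovations can be checked numerically in the simplest setting. The sketch below uses a discrete-time scalar linear-Gaussian analogue of (1)-(3), not the continuous-time problem treated in [1]; the model parameters `a`, `c`, `q`, `r` and both function names are assumptions made for the illustration.

```python
import math
import random


def kalman_innovations(a=0.9, c=1.0, q=0.5, r=1.0, n=5000, seed=1):
    """Run a scalar Kalman filter and return the normalized innovations.

    Assumed model: x[k+1] = a x[k] + w[k], y[k] = c x[k] + v[k],
    with w ~ N(0, q), v ~ N(0, r) and x[0] ~ N(0, 1).
    """
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)      # true state, drawn from the prior
    x_hat, p = 0.0, 1.0          # predicted mean and variance
    normalized = []
    for _ in range(n):
        y = c * x + rng.gauss(0.0, math.sqrt(r))
        s = c * c * p + r        # innovation variance
        nu = y - c * x_hat       # innovation: observation minus its prediction
        normalized.append(nu / math.sqrt(s))
        k = p * c / s            # Kalman gain
        x_hat += k * nu          # measurement update of the mean
        p *= 1.0 - k * c         # measurement update of the variance
        x = a * x + rng.gauss(0.0, math.sqrt(q))  # true state transition
        x_hat = a * x_hat        # time update of the mean
        p = a * a * p + q        # time update of the variance
    return normalized


def lag1_autocorrelation(z):
    """Sample autocorrelation of the sequence z at lag one."""
    m = sum(z) / len(z)
    num = sum((z[i] - m) * (z[i + 1] - m) for i in range(len(z) - 1))
    den = sum((v - m) ** 2 for v in z)
    return num / den
```

For a correctly tuned filter the normalized innovations should behave like unit-variance white noise, so their lag-one autocorrelation should be close to zero.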

For non-linear problems, in general the innovations process cannot be
effectively computed. It turns out that the correct object one should try to
compute recursively is not the conditional density
$p(x_t \mid y_s,\ 0 \le s \le t)$, but an unnormalized form of it. If we denote
this by $q(t, z, y^t)$, then it can be shown that $q$ satisfies a bilinear
stochastic partial differential equation which depends on the observations
$\{y_s,\ 0 \le s \le t\}$ and not on the innovations. Furthermore, we have been
able to find classes of examples of non-linear estimation for which we can
explicitly solve this equation. The systematic study of this equation, its
robust form and its analogy to problems of quantum physics has been carried out
by Professor Mitter jointly with Mr. Daniel Ocone (Ph.D. student in the
Mathematics Department and supported by this grant) and Professor John Baras of
the University of Maryland, who visited M.I.T. during the fall term of 1979 and
was partially supported by this grant. Details of this work are reported in [2]
and several other publications are in preparation.
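The report does not write the equation out. For orientation, the form usually quoted in the filtering literature (the Duncan-Mortensen-Zakai equation) is given below for scalar $x$ and $y$, in the Ito sense and assuming unit-intensity, independent noises in (1) and (2). The operator symbol $\mathcal{L}^{*}$ and the dummy variable $\zeta$ are notation introduced here, and the dependence of $q$ on the observation path is suppressed.

```latex
% Unnormalized conditional density q(t, z) for the model (1)-(2), scalar case.
\begin{align}
  dq(t,z) &= \mathcal{L}^{*} q(t,z)\,dt + h(z)\,q(t,z)\,dy_t , \\
  \mathcal{L}^{*}\varphi(z) &= -\frac{\partial}{\partial z}\bigl(f(z)\,\varphi(z)\bigr)
      + \frac{1}{2}\,\frac{\partial^{2}}{\partial z^{2}}\bigl(g(z)^{2}\,\varphi(z)\bigr), \\
  p(x_t = z \mid y_s,\ 0 \le s \le t) &= \frac{q(t,z)}{\int q(t,\zeta)\,d\zeta} .
\end{align}
```

The equation is linear in $q$ and driven multiplicatively by the observation increment $dy_t$, which is the sense in which it is bilinear.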

In other work related to non-linear estimation, Professor Mitter in conjunction
with Mr. Daniel Ocone has obtained multiple integral expansions for
representations of conditional statistics of $x_t$ given the observations
$\{y_s,\ 0 \le s \le t\}$. This work was reported in [3], and further details
can be found in the forthcoming dissertation of Daniel Ocone.

Finally, a new approach to obtaining estimates better than the linear minimum
variance estimate for linear stochastic estimation problems with multiplicative
noise is being investigated by Professor Mitter and Mr. Lawrence Vallot
(graduate student, partially supported by this grant). This approach consists
of generating new observables based on the observation $y$ and the linear
minimum variance estimate, and improving the estimate based on these new
observables. This work will be reported in the S.M. Thesis of Mr. L. Vallot.

The work done during the current grant period promises to have important
practical consequences. For the first time, a systematic approach to obtaining
sub-optimal estimators using perturbation theory appears to be feasible. We
believe this work will also lead to an adequate analysis of the convergence
properties of the extended Kalman filter and other successive linearization
techniques.

Most of the major advances in non-linear estimation over the past two to three
years have been made by the M.I.T. group or by people who have visited the
M.I.T. group (such as J. Baras, V. Benes, M.H.A. Davis, E. Wong, E. Pardoux).

References  for  Section  2 

1. D. Allinger and S.K. Mitter: New Results on the Innovations Problem for
Non-Linear Filtering, LIDS-R-964, January 1980 (submitted to Stochastics).

2. S.K. Mitter: On the Analogy between Mathematical Problems of Non-Linear
Filtering and Quantum Physics, to appear in Ricerche di Automatica, special
issue devoted to System Theory and Physics.

3. S.K. Mitter and D. Ocone: Multiple Integral Expansions for Nonlinear
Filtering, LIDS-P-943, September 1979 and Proceedings of the 18th IEEE
Conference on Decision and Control, Fort Lauderdale, Florida, 1979.
3. Stochastic Control

Professor Mitter, assisted by Mr. Thrasyvoulos Pappas (research assistant),
has continued his work on a geometrical theory of stochastic control. This work
is a generalization of the work of W.M. Wonham on a generalized theory of
deterministic linear multivariable control systems [1]. An important role is
played in this theory by the concepts of (A,B)-invariant and controllability
subspaces.

The basic model is that of a linear multivariable system in state-space form
which is perturbed by additive white Gaussian noise. In addition, the
measurements are also corrupted by additive white Gaussian noise, and it is
desired to regulate certain other output variables by means of linear constant
feedback of the estimated state of the system. Regulation here is understood to
mean that the variance of the variables to be regulated remains bounded.
Preliminary work on this problem was done by Snyders and Wonham [2], who
obtained certain sufficient conditions for regulation to be possible by
relating this problem to the restricted regulator problem.

Recent work by L. Shumaker and J.C. Willems (as yet unpublished) suggests that
it would be possible to obtain necessary and sufficient conditions for
regulation based on the concept of (C,A)-invariant subspaces. Furthermore,
based on our work on state estimation, it appears that we can introduce a
concept of stochastic observability which will have an important role to play
here.

Our work on state estimation reported in the previous section also has
implications in stochastic control. It appears that for stochastic control in
the presence of measurement uncertainties the control function can always be
chosen as a feedback function of the (uncontrolled) unnormalized conditional
densities.


References for Section 3

1. W.M. Wonham: Linear Multivariable Control: A Geometric Approach,
Springer-Verlag, New York, 1979.

2. J. Snyders and W.M. Wonham: Regulation of Linear Stochastic Systems,
SIAM Journal on Control, Vol. 13, No. 4, July 1975, pp. 853-864.
4. Adaptive Control

During the past year the adaptive control research accelerated, due primarily
to the addition of Dr. L. Valavani to the research team. Our long range
objective is to develop a methodology of design for adaptive control systems by
attempting to unify promising concepts based upon hyperstability theory and
stochastic optimal control, respectively, with some common sense control
engineering techniques. Our relative optimism stems from the fact that during
the past two years a better theoretical understanding of "model reference"
adaptive control techniques has emerged, together with a unification of
certain, hitherto distinct, adaptive control methods. In addition, a more
fundamental understanding of robustness properties of multivariable control
systems has been gained through the use of singular value diagrams. We feel
that the time may be "ripe" for significant advances in the theory of adaptive
control.

In  the  remainder  of  this  section  we  summarize  our  progress  to  date  in  the 
following  four  areas: 

(a)  Stable  Adaptive  Schemes 

(b)  Convergence  Properties  of  the  Adaptive  Process 

(c)  Stochastic  Adaptive  Control 

(d) Adaptive Dual Control Studies.

Stable  Adaptive  Schemes 

In recent years there has been considerable activity in the design of adaptive
controllers, with special emphasis on their stability properties. Many
different adaptive schemes were suggested, and the question of asymptotic
stability became of critical importance, particularly since in most of these
schemes auxiliary feedback adaptive signals (which could conceivably become
unbounded) were used in both plant and reference model. More recently, global
asymptotic stability proofs have become available for some adaptive
controllers, primarily those dealing with deterministic systems.


A detailed assessment of the various schemes proposed so far, both discrete and
continuous, is given in a paper under preparation by Dr. Valavani [1]. It is
shown that the stability proofs of the various schemes, derived by different
methods, can be given in a unified manner, using a generic model of the error
equations.

The paper also deals with the generalization of this model to a more abstract
setting, thus providing the necessary insights towards new adaptive control
schemes with fewer restrictive assumptions. Further, the particular adaptive
schemes described in the literature are derived by applying special
modifications to this general model. Work on the implications of such a
unification for optimal adaptive schemes, as compared to the model reference
schemes considered so far, is currently in progress and will be reported at a
later time.

Convergence  Properties  of  the  Adaptive  Process 

A considerable part of our research, by Dr. Valavani, Professor Athans, and
Mr. Rohrs, has been focused on the convergence patterns exhibited by the
various adaptive algorithms. Our aim is to obtain a good understanding of the
evolution of the adaptive process and, based on this, to improve the rates of
convergence, thus making the algorithms feasible for practical applications.

Earlier simulation runs on the digital computer had shown that, although the
output errors tend to zero at the end of adaptation, the parameters almost
never converge to their "true" values. This is consistent with the "dual
effect", which becomes much more pronounced in the general case of stochastic
adaptive problems. But even in deterministic adaptive control our knowledge of
the precise nature of adaptation is far from satisfactory at the present time.
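The phenomenon can be reproduced with a textbook example. The sketch below is not one of the schemes studied in this work: it is a standard Lyapunov-based model reference adaptive controller for a scalar plant $\dot{x} = a x + b u$ with known sign of $b$, driven by a constant reference. The plant and model numbers, the adaptation gain `gamma`, the Euler step and the function name are all assumptions.

```python
def simulate_mrac(a=1.0, b=2.0, a_m=2.0, b_m=2.0, gamma=1.0,
                  r=1.0, dt=1e-3, t_end=60.0):
    """Euler simulation of a scalar model reference adaptive controller.

    Plant:      dx/dt   = a x + b u        (a, b unknown to the controller, b > 0)
    Model:      dx_m/dt = -a_m x_m + b_m r
    Controller: u = k_r r + k_x x, gains adapted from the error e = x - x_m.
    Returns the final tracking error and the final gains (k_r, k_x).
    """
    x = x_m = 0.0
    k_r = k_x = 0.0
    for _ in range(int(round(t_end / dt))):
        e = x - x_m
        u = k_r * r + k_x * x
        dx = a * x + b * u
        dx_m = -a_m * x_m + b_m * r
        dk_r = -gamma * e * r   # adaptive law, using sign(b) = +1
        dk_x = -gamma * e * x
        x += dt * dx
        x_m += dt * dx_m
        k_r += dt * dk_r
        k_x += dt * dk_x
    return x - x_m, k_r, k_x


# Gains that would match the plant to the model exactly.
K_R_TRUE = 2.0 / 2.0             # b_m / b
K_X_TRUE = -(1.0 + 2.0) / 2.0    # -(a + a_m) / b

error, k_r, k_x = simulate_mrac()
```

With a constant reference only the combination $k_r + k_x$ is pinned down in steady state, so the tracking error vanishes while the individual gains settle away from `K_R_TRUE` and `K_X_TRUE`.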

Systematic simulation studies of a simple second order example at this stage
have shown some interesting, and rather encouraging, behavior. Bode plots of
the controlled adaptive system indicate that it is able to pick up the high
frequency (of the model) very fast and track it arbitrarily closely from then
on. This points to the existence of an "optimal" choice of auxiliary state
variable filters within this framework. Furthermore, the almost identical
behavior (in the Bode plots) of the model and controlled plant at the higher
(but not cutoff) frequency part of the spectrum may mean that the uncertainty
region around the "nominal" plot near the critical point (on the Nyquist plane)
is greatly reduced. This has obvious implications for the stability properties
of the overall system.

Work is progressing along these lines in order to quantify these regions of
uncertainty during adaptation. Optimization of the adaptive process, in terms
of improved convergence characteristics, can then be carried out in the sense
of minimizing the radii bounding uncertainty around "nominal" trajectories.
This will obviously affect the choice of adaptive gains for improved
performance.

It has been argued [2] that knowledge of the sign of the high frequency gain,
as well as the exact relative degree of the process, provides enough
information for the design of a "heuristic" controller with comparable
performance but simpler implementation. We are presently trying to "capture"
the effect of different assumptions with respect to the high frequency gain on
the adaptation. Simultaneously, the pattern of evolution of the characteristic
roots (poles) of the controlled system, as well as the transfer function
configurations to which it evolves, will provide valuable information on which
to draw for design improvements.

Our simulation studies so far have also shown that the control input exhibits
an interesting behavior, particularly when a comparison is made with the
evolution pattern of the characteristic roots of the plant. Further, it appears
that the adaptive process overall exhibits a rather consistent pattern in terms
of frequency characteristics and the "adaptive" pole-zero configuration, which
admits an analytic description. Work is progressing in this direction and the
first results will be contained in [3].


5.  Stochastic  Adaptive  Control 


When additive and/or multiplicative noise is present in the system, the entire
adaptive control problem becomes very complex. To date, there are no complete
proofs of its stability. In the model reference and self-tuning regulator
approaches the disturbance usually enters as observation (additive) noise.
However, due to feedback in the adaptive loop it also becomes multiplicative
and affects the "estimated" or "controlled" parameter values. So far,
stochastic adaptive controllers in this case use some ad hoc modifications of
their deterministic counterparts, motivated by the idea that the effects of
noise should be ignored (switched off) in the adjustment of the control
parameters after some advanced stage is reached in the adaptation. Stability
results in this direction have only been obtained locally and have had to rely
on certain restrictive assumptions concerning the boundedness (and smoothness)
of functions involved in the proofs.

Our approach to this problem has been a global one. We are trying to formulate
a general framework within which the adaptive laws can be chosen so as to
guarantee stability of the overall system at the outset. Research has been
carried out by Dr. Valavani in an effort to define a "stochastic storage
function" concept, motivated by the corresponding one used in hyperstability
theory for deterministic systems. The ideas of passivity and positive reality
are intimately linked with this. It is interesting to note at this point that
Brockett and Willems [4] were able to define a scalar constant quantity which
they called "temperature" for stochastic systems whose deterministic parts are
positive real. We expect to be able to arrive at a definition of our "energy
indexing function" ("stochastic storage function") from the description of the
"uncertainty regions" obtained during adaptation for the stochastic systems.
Our results from section 4 are very encouraging in this direction. Given that
the deterministic adaptive laws were derived from realizing positive real
transfer functions (which are strongly stable) for the error equations, at
least in the transient stage, it is not unreasonable to expect that the
"uncertainty regions" (which will now depend on noise parameters as well) will
be containable and rather well behaved. Within this framework we can then
choose our adaptive laws so as to make the "radii" of these "uncertainty
regions" decrease as adaptation proceeds. Work is progressing along these lines
and will be reported at a future date.

Adaptive  Dual  Control  Studies 

Most stochastic optimal control problems are not amenable to a solution through
the stochastic dynamic programming equation. This is so because of the "curse
of dimensionality." The need, therefore, naturally arises for suboptimal
algorithms. Those suboptimal algorithms should, however, share desirable
qualitative features with the optimal controls. The study of simple examples of
discrete-time linear systems with quadratic cost and multiplicative noise
indicates two consequences of parameter uncertainty on the optimal control law.
On the one hand, the presence of uncertainty in the system parameters has a
stimulating action on the optimal control, because a control exercised at a
given time can improve the accuracy of future parameter estimates. This effect
has been called loosely the probing effect of the control. On the other hand,
the presence of uncertainty which cannot be reduced by the control has an
inhibitory effect on the control, loosely called the caution effect; the larger
those irreducible uncertainties, the more attenuated the control should be.
Neither of these consequences of uncertainty, together the so-called dual
effect, is captured by the naive "certainty equivalent" (CE) control law, which
is obtained by setting all random parameters to their a priori mean values and
treating the system as deterministic.
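The caution effect can be seen in a one-step calculation. The sketch below assumes a scalar system $x_{k+1} = a x_k + b_k u_k$ in which only the input gain $b_k$ is random, with known mean and variance, and a one-stage cost $E[x_{k+1}^2]$. This particular setup, the function names and the numbers are assumptions chosen for the illustration, not the examples studied in this work.

```python
def expected_next_cost(x, u, a, b_mean, b_var):
    """E[(a x + b u)^2] when b has mean b_mean and variance b_var."""
    return (a * x + b_mean * u) ** 2 + b_var * u ** 2


def ce_control(x, a, b_mean):
    """Certainty-equivalent control: treat b as if it equalled its mean."""
    return -a * x / b_mean


def cautious_control(x, a, b_mean, b_var):
    """Minimizer of expected_next_cost over u; shrinks as b_var grows."""
    return -a * b_mean * x / (b_mean ** 2 + b_var)


x, a, b_mean, b_var = 1.0, 1.0, 1.0, 1.0
u_ce = ce_control(x, a, b_mean)
u_cautious = cautious_control(x, a, b_mean, b_var)
cost_ce = expected_next_cost(x, u_ce, a, b_mean, b_var)
cost_cautious = expected_next_cost(x, u_cautious, a, b_mean, b_var)
```

With these numbers the cautious control is half the size of the CE control and halves the expected cost. Because the uncertainty in $b_k$ is renewed at every step in this setup, there is nothing to learn and no probing component appears.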


In the more general cases, wide-sense dual adaptive algorithms have been
suggested. The crux of those adaptive algorithms is to approximate the
cost-to-go in the dynamic programming equation by expanding it about a nominal
trajectory to second-order terms in perturbations resulting from random
disturbances. The resulting cost, called the dual cost, is minimized to yield
the suboptimal control at the corresponding time stage. It has been observed in
simulations that the algorithms display the desirable caution and probing
features. Moreover, it has been claimed that the dual cost can be decomposed
into a sum of terms which account respectively for the caution effect, the
probing effect and the deterministic part of the cost.

In general, however, it is impossible to compare such dual control laws with
the optimal one, which is unknown, in the case of constant but unknown
parameters. During the past year Mr. Dersin and Professors Athans and Kendrick
[5] considered a different special case of a scalar, discrete-time linear
system with white multiplicative Gaussian noise and perfectly observed state.
The optimal control law of such systems, for a quadratic performance index, is
known. We show that, in that special case, it is possible to derive the dual
cost and the dual control explicitly in closed form when the length of the
planning horizon goes to infinity.

Some valuable insight can be obtained, since we show that the asymptotic
(i.e., infinite horizon) dual control law is in fact equivalent to a
first-order expansion of the optimal control law for systems with white
parameters as a function of the parameter covariances, about the nominal value
of null parameter covariances, which corresponds to a deterministic problem.
Since the certainty-equivalent (CE) control is simply a zero-order
approximation, the dual control is shown to be intermediate (optimal to linear
terms) between the CE and the optimal control. The accuracy of the dual control
law for small parameter covariances is quite surprising, as no learning can
take place in this problem, due to the white-noise parameter assumption. In
other words, if the system parameters have small standard deviations about
their mean values, we demonstrate by means of a scalar example that the dual
control is (to first-order terms in the parameter standard deviations)
identical to the white-parameter optimal control law, which involves no
learning.
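The zero-order and first-order approximations can be compared numerically. The sketch below does not implement the dual control algorithm of [5]; it only computes the infinite-horizon optimal feedback gain for an assumed scalar system $x_{k+1} = a x_k + b_k u_k$ with white $b_k$ (mean `b_mean`, variance `b_var`) and stage cost $q x^2 + r u^2$, together with its CE value and its first-order Taylor expansion in `b_var`. The model, the numbers and the function names are assumptions.

```python
def riccati_fixed_point(a, b_mean, b_var, q, r, n_iter=500):
    """Value iteration for p in V(x) = p x^2 with white multiplicative noise on b."""
    p = 0.0
    for _ in range(n_iter):
        denom = r + p * (b_mean ** 2 + b_var)
        p = q + a * a * p - (a * b_mean * p) ** 2 / denom
    return p


def optimal_gain(a, b_mean, b_var, q, r):
    """Gain L of the optimal law u = -L x for white-parameter uncertainty b_var."""
    p = riccati_fixed_point(a, b_mean, b_var, q, r)
    return a * b_mean * p / (r + p * (b_mean ** 2 + b_var))


def first_order_gain(a, b_mean, b_var, q, r, h=1e-6):
    """First-order Taylor expansion of optimal_gain in b_var about b_var = 0."""
    gain_0 = optimal_gain(a, b_mean, 0.0, q, r)
    slope = (optimal_gain(a, b_mean, h, q, r) - gain_0) / h  # forward difference
    return gain_0 + slope * b_var


a, b_mean, q, r, b_var = 1.1, 1.0, 1.0, 1.0, 0.05
gain_ce = optimal_gain(a, b_mean, 0.0, q, r)       # zero-order (CE) gain
gain_opt = optimal_gain(a, b_mean, b_var, q, r)    # white-parameter optimal gain
gain_first = first_order_gain(a, b_mean, b_var, q, r)
```

For a small variance the first-order gain lies much closer to the optimal gain than the CE gain does, which is the sense in which a law that is "optimal to linear terms" sits between the two.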

One can argue both ways whether this is "good news or bad news". The "good
news" is that if the system parameters are not very random, then the inherent
"robustness" properties of feedback, modulated correctly for parameter
uncertainty, require no detailed "learning" of the parameters, provided that
certain "caution" is exercised (this is not what the certainty-equivalence
principle states). The "bad news" is that the dual control algorithm does not
seem to capture the required "caution" effects when the system parameters are
very uncertain and very weakly correlated in time.